{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Semantic Scholar Loader in llama-index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "import openai\n",
    "from llama_hub.semanticscholar.base import SemanticScholarReader\n",
    "from llama_index import (\n",
    "    ServiceContext,\n",
    "    StorageContext,\n",
    "    VectorStoreIndex,\n",
    "    load_index_from_storage,\n",
    ")\n",
    "from llama_index.llms import OpenAI\n",
    "from llama_index.query_engine import CitationQueryEngine\n",
    "from llama_index.response.notebook_utils import display_response\n",
    "\n",
    "# Reader that searches Semantic Scholar and returns the papers as documents\n",
    "s2reader = SemanticScholarReader()\n",
    "\n",
    "# Configure the LLM; OPENAI_API_KEY must be set in the environment\n",
    "openai.api_key = os.environ[\"OPENAI_API_KEY\"]\n",
    "service_context = ServiceContext.from_defaults(\n",
    "    llm=OpenAI(model=\"gpt-3.5-turbo\", temperature=0)\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "**`Final Response:`** Large language models have limitations in terms of their training cost and computational resources [1]. While they can be efficient once trained, generating content from a trained model can still consume significant resources [1]. Techniques like model distillation can help reduce the cost of these models [1]. Additionally, increasing the size of language models may not necessarily improve their performance on long-tail knowledge or rare instances [3]. Scaling up models alone may not be sufficient to achieve high accuracy on specific types of questions [3]. There is also a need to modify the training objective or increase the number of training epochs to encourage memorization and focus on salient facts [4]. It is important to be cautious in how we talk about large language models, avoiding anthropomorphism and recognizing their limitations [5]."
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "---"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "**`Source Node 1/6`**"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "**Node ID:** 35028af6-85ea-4f55-a5b4-bfe11778cbb2<br>**Similarity:** 0.8679221353278955<br>**Text:** Source 1:\n",
       "consume signiﬁcant resources during training, they can be surprisingly efﬁcient once tr...<br>**Metadata:** {'title': 'Language Models are Few-Shot Learners', 'venue': 'Neural Information Processing Systems', 'year': 2020, 'paperId': '6b85b63579a916f705a8e10a49bd8d849d91b1fc', 'citationCount': 14032, 'openAccessPdf': None, 'authors': ['Tom B. Brown', 'Benjamin Mann', 'Nick Ryder', 'Melanie Subbiah', 'J. Kaplan', 'Prafulla Dhariwal', 'Arvind Neelakantan', 'Pranav Shyam', 'Girish Sastry', 'Amanda Askell', 'Sandhini Agarwal', 'Ariel Herbert-Voss', 'Gretchen Krueger', 'T. Henighan', 'Rewon Child', 'A. Ramesh', 'Daniel M. Ziegler', 'Jeff Wu', 'Clemens Winter', 'Christopher Hesse', 'Mark Chen', 'Eric Sigler', 'Mateusz Litwin', 'Scott Gray', 'Benjamin Chess', 'Jack Clark', 'Christopher Berner', 'Sam McCandlish', 'Alec Radford', 'Ilya Sutskever', 'Dario Amodei'], 'externalIds': {'DBLP': 'journals/corr/abs-2005-14165', 'ArXiv': '2005.14165', 'MAG': '3030163527', 'CorpusId': 218971783}}<br>"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "---"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "**`Source Node 2/6`**"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "**Node ID:** e5f116b6-5fe9-4020-90c6-59ad6d28d0f1<br>**Similarity:** 0.8679221353278955<br>**Text:** Source 2:\n",
       "Our work focuses on the ﬁrst approach (scaling compute and parameters together,\n",
       "by stra...<br>**Metadata:** {'title': 'Language Models are Few-Shot Learners', 'venue': 'Neural Information Processing Systems', 'year': 2020, 'paperId': '6b85b63579a916f705a8e10a49bd8d849d91b1fc', 'citationCount': 14032, 'openAccessPdf': None, 'authors': ['Tom B. Brown', 'Benjamin Mann', 'Nick Ryder', 'Melanie Subbiah', 'J. Kaplan', 'Prafulla Dhariwal', 'Arvind Neelakantan', 'Pranav Shyam', 'Girish Sastry', 'Amanda Askell', 'Sandhini Agarwal', 'Ariel Herbert-Voss', 'Gretchen Krueger', 'T. Henighan', 'Rewon Child', 'A. Ramesh', 'Daniel M. Ziegler', 'Jeff Wu', 'Clemens Winter', 'Christopher Hesse', 'Mark Chen', 'Eric Sigler', 'Mateusz Litwin', 'Scott Gray', 'Benjamin Chess', 'Jack Clark', 'Christopher Berner', 'Sam McCandlish', 'Alec Radford', 'Ilya Sutskever', 'Dario Amodei'], 'externalIds': {'DBLP': 'journals/corr/abs-2005-14165', 'ArXiv': '2005.14165', 'MAG': '3030163527', 'CorpusId': 218971783}}<br>"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "---"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "**`Source Node 3/6`**"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "**Node ID:** 3bd5a072-739a-4d4d-bd2d-30b8a50e26f1<br>**Similarity:** 0.864251829100195<br>**Text:** Source 3:\n",
       "small accu-\n",
       "racy gains.An alternative idea would be to increase the\n",
       "diversity of the pr...<br>**Metadata:** {'title': 'Large Language Models Struggle to Learn Long-Tail Knowledge', 'venue': 'International Conference on Machine Learning', 'year': 2022, 'paperId': '75f7e9e2b59fb640ef9d1dff94097175daf46c4d', 'citationCount': 35, 'openAccessPdf': None, 'authors': ['Nikhil Kandpal', 'H. Deng', 'Adam Roberts', 'Eric Wallace', 'Colin Raffel'], 'externalIds': {'DBLP': 'journals/corr/abs-2211-08411', 'ArXiv': '2211.08411', 'DOI': '10.48550/arXiv.2211.08411', 'CorpusId': 253522998}}<br>"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "---"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "**`Source Node 4/6`**"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "**Node ID:** 8ee81016-da93-4d9a-b038-f47020260f85<br>**Similarity:** 0.864251829100195<br>**Text:** Source 4:\n",
       "All of the LMs that we study do limited epochs,\n",
       "as it is generally seen as preferable t...<br>**Metadata:** {'title': 'Large Language Models Struggle to Learn Long-Tail Knowledge', 'venue': 'International Conference on Machine Learning', 'year': 2022, 'paperId': '75f7e9e2b59fb640ef9d1dff94097175daf46c4d', 'citationCount': 35, 'openAccessPdf': None, 'authors': ['Nikhil Kandpal', 'H. Deng', 'Adam Roberts', 'Eric Wallace', 'Colin Raffel'], 'externalIds': {'DBLP': 'journals/corr/abs-2211-08411', 'ArXiv': '2211.08411', 'DOI': '10.48550/arXiv.2211.08411', 'CorpusId': 253522998}}<br>"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "---"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "**`Source Node 5/6`**"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "**Node ID:** 17bbc1ce-f975-4ef2-a0c5-9110ade5225f<br>**Similarity:** 0.8627260872607259<br>**Text:** Source 5:\n",
       "Well, this is ﬁne\n",
       "as long as there is no possibility of anyone as-\n",
       "signing more weight ...<br>**Metadata:** {'title': 'Talking About Large Language Models', 'venue': 'arXiv.org', 'year': 2022, 'paperId': '3eed4de25636ac90f39f6e1ef70e3507ed61a2a6', 'citationCount': 43, 'openAccessPdf': None, 'authors': ['M. Shanahan'], 'externalIds': {'ArXiv': '2212.03551', 'DBLP': 'journals/corr/abs-2212-03551', 'DOI': '10.48550/arXiv.2212.03551', 'CorpusId': 254366666}}<br>"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "---"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "**`Source Node 6/6`**"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "**Node ID:** 48150b03-0525-498f-8601-c4698954a7db<br>**Similarity:** 0.8627260872607259<br>**Text:** Source 6:\n",
       "Acknowledgments\n",
       "Thanks to Toni Creswell, Richard Evans, Chris-\n",
       "tos Kaplanis, Andrew Lam...<br>**Metadata:** {'title': 'Talking About Large Language Models', 'venue': 'arXiv.org', 'year': 2022, 'paperId': '3eed4de25636ac90f39f6e1ef70e3507ed61a2a6', 'citationCount': 43, 'openAccessPdf': None, 'authors': ['M. Shanahan'], 'externalIds': {'ArXiv': '2212.03551', 'DBLP': 'journals/corr/abs-2212-03551', 'DOI': '10.48550/arXiv.2212.03551', 'CorpusId': 254366666}}<br>"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "query_space = \"large language models\"\n",
    "query_string = \"limitations of using large language models\"\n",
    "full_text = True\n",
    "# Keep total_papers modest when full_text=True: downloading the\n",
    "# full text of every paper can take a long time.\n",
    "total_papers = 50\n",
    "\n",
    "# Directory where the index is cached, keyed by the search settings\n",
    "persist_dir = f\"./citation_{query_space}_{total_papers}_{full_text}\"\n",
    "\n",
    "if not os.path.exists(persist_dir):\n",
    "    # First run: load papers from Semantic Scholar, build the index, persist it\n",
    "    documents = s2reader.load_data(query_space, total_papers, full_text=full_text)\n",
    "    index = VectorStoreIndex.from_documents(documents, service_context=service_context)\n",
    "    index.storage_context.persist(persist_dir=persist_dir)\n",
    "else:\n",
    "    # Later runs: reload the persisted index instead of downloading again\n",
    "    index = load_index_from_storage(\n",
    "        StorageContext.from_defaults(persist_dir=persist_dir),\n",
    "        service_context=service_context,\n",
    "    )\n",
    "\n",
    "# Initialize the citation query engine: retrieve the 3 most similar nodes\n",
    "# and split them into citation chunks of size 512\n",
    "query_engine = CitationQueryEngine.from_args(\n",
    "    index,\n",
    "    similarity_top_k=3,\n",
    "    citation_chunk_size=512,\n",
    ")\n",
    "\n",
    "# Run the query and display the answer with its cited source nodes\n",
    "response = query_engine.query(query_string)\n",
    "display_response(\n",
    "    response, show_source=True, source_length=100, show_source_metadata=True\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "**`Final Response:`** The efficacy numbers of the COVID-19 vaccines are as follows:\n",
       "\n",
       "- NVX-CoV2373: 49% efficacy against the B.1.351 variant, increasing to 60% when excluding HIV-positive individuals [1].\n",
       "- Ad26.COV2-S: 72% efficacy against PCR-confirmed infection in the USA, reduced to 66% efficacy in Latin America and 57% efficacy in South Africa [1].\n",
       "- AZD1222: Did not demonstrate protection against mild to moderate B.1.351-induced COVID-19 [1].\n",
       "- BNT162b2: Elicited antibodies with neutralizing activity against B.1.1.7 and P.1 variants [1].\n",
       "- CoronaVac: 50% efficacy against symptomatic infection [1].\n",
       "- Sinopharm (BBIBP-CorV): 78% efficacy against COVID-19 [5].\n",
       "- Novavax (NVX-CoV2373): 89% efficacy against symptomatic COVID-19 [5].\n",
       "- VECTOR (EpiVacCorona): No data available [5].\n",
       "\n",
       "Note: These efficacy numbers are based on the provided sources and may not represent the most up-to-date information."
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "---"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "**`Source Node 1/6`**"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "**Node ID:** 4e5f66ec-7455-436c-ad7b-db666bfa0f1f<br>**Similarity:** 0.8624234672546093<br>**Text:** Source 1:\n",
       "NVX-CoV2373 \n",
       "showed an efficacy of 49% against the \n",
       "B.1.351 variant in the prevention o...<br>**Metadata:** {'title': 'Progress of the COVID-19 vaccine effort: viruses, vaccines and variants versus efficacy, effectiveness and escape', 'venue': 'Nature reviews. Immunology', 'year': 2021, 'paperId': 'b8b7b90263b1168d9466feb99ce2ce1efa7514b3', 'citationCount': 603, 'openAccessPdf': 'https://www.nature.com/articles/s41577-021-00592-1.pdf', 'authors': ['J. Tregoning', 'Katie E. Flight', 'Sophie L. Higham', 'Ziyin Wang', 'B. F. Pierce'], 'externalIds': {'PubMedCentral': '8351583', 'DOI': '10.1038/s41577-021-00592-1', 'CorpusId': 236968006, 'PubMed': '34373623'}}<br>"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "---"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "**`Source Node 2/6`**"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "**Node ID:** f0144cc7-f35b-49c7-aca4-030bed69ee1e<br>**Similarity:** 0.8624234672546093<br>**Text:** Source 2:\n",
       "617.2) variant.A significant \n",
       "decrease in neutralizing antibody titre \n",
       "has been seen fo...<br>**Metadata:** {'title': 'Progress of the COVID-19 vaccine effort: viruses, vaccines and variants versus efficacy, effectiveness and escape', 'venue': 'Nature reviews. Immunology', 'year': 2021, 'paperId': 'b8b7b90263b1168d9466feb99ce2ce1efa7514b3', 'citationCount': 603, 'openAccessPdf': 'https://www.nature.com/articles/s41577-021-00592-1.pdf', 'authors': ['J. Tregoning', 'Katie E. Flight', 'Sophie L. Higham', 'Ziyin Wang', 'B. F. Pierce'], 'externalIds': {'PubMedCentral': '8351583', 'DOI': '10.1038/s41577-021-00592-1', 'CorpusId': 236968006, 'PubMed': '34373623'}}<br>"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "---"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "**`Source Node 3/6`**"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "**Node ID:** ddceeaeb-a91f-403b-a4ed-8f7cc8307ccb<br>**Similarity:** 0.8616244247348551<br>**Text:** Source 3:\n",
       "The only valid way to compare vaccines directly is \n",
       "in head-to-head efficacy trials, wh...<br>**Metadata:** {'title': 'Progress of the COVID-19 vaccine effort: viruses, vaccines and variants versus efficacy, effectiveness and escape', 'venue': 'Nature reviews. Immunology', 'year': 2021, 'paperId': 'b8b7b90263b1168d9466feb99ce2ce1efa7514b3', 'citationCount': 603, 'openAccessPdf': 'https://www.nature.com/articles/s41577-021-00592-1.pdf', 'authors': ['J. Tregoning', 'Katie E. Flight', 'Sophie L. Higham', 'Ziyin Wang', 'B. F. Pierce'], 'externalIds': {'PubMedCentral': '8351583', 'DOI': '10.1038/s41577-021-00592-1', 'CorpusId': 236968006, 'PubMed': '34373623'}}<br>"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "---"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "**`Source Node 4/6`**"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "**Node ID:** 0976ea2c-4d05-4201-8ca1-a585053b6756<br>**Similarity:** 0.8616244247348551<br>**Text:** Source 4:\n",
       "population studied and prevalence \n",
       "of SARS-CoV-2 variants at the time of the \n",
       "trial, it...<br>**Metadata:** {'title': 'Progress of the COVID-19 vaccine effort: viruses, vaccines and variants versus efficacy, effectiveness and escape', 'venue': 'Nature reviews. Immunology', 'year': 2021, 'paperId': 'b8b7b90263b1168d9466feb99ce2ce1efa7514b3', 'citationCount': 603, 'openAccessPdf': 'https://www.nature.com/articles/s41577-021-00592-1.pdf', 'authors': ['J. Tregoning', 'Katie E. Flight', 'Sophie L. Higham', 'Ziyin Wang', 'B. F. Pierce'], 'externalIds': {'PubMedCentral': '8351583', 'DOI': '10.1038/s41577-021-00592-1', 'CorpusId': 236968006, 'PubMed': '34373623'}}<br>"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "---"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "**`Source Node 5/6`**"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "**Node ID:** 21168cc5-8587-4a61-9c0c-4ebad48e2653<br>**Similarity:** 0.8593642969779912<br>**Text:** Source 5:\n",
       "Although differences in how the \n",
       "clinical trials were set up make comparison \n",
       "between v...<br>**Metadata:** {'title': 'Progress of the COVID-19 vaccine effort: viruses, vaccines and variants versus efficacy, effectiveness and escape', 'venue': 'Nature reviews. Immunology', 'year': 2021, 'paperId': 'b8b7b90263b1168d9466feb99ce2ce1efa7514b3', 'citationCount': 603, 'openAccessPdf': 'https://www.nature.com/articles/s41577-021-00592-1.pdf', 'authors': ['J. Tregoning', 'Katie E. Flight', 'Sophie L. Higham', 'Ziyin Wang', 'B. F. Pierce'], 'externalIds': {'PubMedCentral': '8351583', 'DOI': '10.1038/s41577-021-00592-1', 'CorpusId': 236968006, 'PubMed': '34373623'}}<br>"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "---"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "**`Source Node 6/6`**"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "**Node ID:** 344121c1-d02a-4149-87c1-4c682d5d7dc1<br>**Similarity:** 0.8593642969779912<br>**Text:** Source 6:\n",
       "laboratory \n",
       "confirmed COVID-19 \n",
       "within \n",
       "6 months after first dose≥18 years old 9 months...<br>**Metadata:** {'title': 'Progress of the COVID-19 vaccine effort: viruses, vaccines and variants versus efficacy, effectiveness and escape', 'venue': 'Nature reviews. Immunology', 'year': 2021, 'paperId': 'b8b7b90263b1168d9466feb99ce2ce1efa7514b3', 'citationCount': 603, 'openAccessPdf': 'https://www.nature.com/articles/s41577-021-00592-1.pdf', 'authors': ['J. Tregoning', 'Katie E. Flight', 'Sophie L. Higham', 'Ziyin Wang', 'B. F. Pierce'], 'externalIds': {'PubMedCentral': '8351583', 'DOI': '10.1038/s41577-021-00592-1', 'CorpusId': 236968006, 'PubMed': '34373623'}}<br>"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "query_space = \"covid 19 vaccine\"\n",
    "query_string = \"List the efficacy numbers of the covid 19 vaccines\"\n",
    "full_text = True\n",
    "# Keep total_papers modest when full_text=True: downloading the\n",
    "# full text of every paper can take a long time.\n",
    "total_papers = 50\n",
    "\n",
    "# Directory where the index is cached, keyed by the search settings\n",
    "persist_dir = f\"./citation_{query_space}_{total_papers}_{full_text}\"\n",
    "\n",
    "if not os.path.exists(persist_dir):\n",
    "    # First run: load papers from Semantic Scholar, build the index, persist it\n",
    "    documents = s2reader.load_data(query_space, total_papers, full_text=full_text)\n",
    "    index = VectorStoreIndex.from_documents(documents, service_context=service_context)\n",
    "    index.storage_context.persist(persist_dir=persist_dir)\n",
    "else:\n",
    "    # Later runs: reload the persisted index instead of downloading again\n",
    "    index = load_index_from_storage(\n",
    "        StorageContext.from_defaults(persist_dir=persist_dir),\n",
    "        service_context=service_context,\n",
    "    )\n",
    "\n",
    "# Initialize the citation query engine: retrieve the 3 most similar nodes\n",
    "# and split them into citation chunks of size 512\n",
    "query_engine = CitationQueryEngine.from_args(\n",
    "    index,\n",
    "    similarity_top_k=3,\n",
    "    citation_chunk_size=512,\n",
    ")\n",
    "\n",
    "# Run the query and display the answer with its cited source nodes\n",
    "response = query_engine.query(query_string)\n",
    "display_response(\n",
    "    response, show_source=True, source_length=100, show_source_metadata=True\n",
    ")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
